Predicting streamflow in Peninsular Malaysia using support vector machine and deep learning algorithms

Floods and droughts are environmental phenomena that occur in Peninsular Malaysia due to extreme values of streamflow (SF). The study of SF prediction is therefore highly significant for mitigating municipal and environmental damage. In the present study, machine learning (ML) models based on the support vector machine (SVM), artificial neural network (ANN), and long short-term memory (LSTM) are developed and tested to predict SF for 11 different rivers throughout Peninsular Malaysia. SF data sets for the rivers were collected from the Malaysian Department of Irrigation and Drainage. The main objective of the present study is to propose a universal model that is most capable of predicting SF for rivers within Peninsular Malaysia. Based on the findings, the ANN3 model, which was developed using the ANN algorithm and input scenario 3 (inputs consisting of the SF of the previous 3 days), is deduced to be the best overall ML model for SF prediction: it outperformed all the other models on 4 of the 11 tested data sets and obtained one of the best (lowest) average ranking means (RMs), with a score of 3.27. This indicates that the model is highly adaptable and reliable in accurately predicting SF across different data sets and river case studies. Therefore, the ANN3 model is proposed as a universal model for SF prediction within Peninsular Malaysia.

Floods and droughts are natural phenomena that have impacted regions within Peninsular Malaysia throughout recorded history. Recently, continuous heavy rainfall in January 2021 caused high streamflow (SF) in rivers and consequent widespread flooding in Peninsular Malaysia, with Pahang the worst affected state. Approximately 50,000 individuals were evacuated, and at least six people died. Meanwhile, the worst water shortage to affect Peninsular Malaysia occurred in 1998, when a prolonged drought caused very low SF and the drying up of dam reservoirs. Because of the shortage, water was rationed for almost 150 days in the Klang Valley, affecting 3.2 million people. Ultimately, these phenomena can be understood as the result of extreme values of SF 1 . Excessively high SF causes a stream to overflow its channel and submerge the surrounding land, causing floods. Conversely, droughts result from excessively low SF, which depletes water resources as rivers and dam reservoirs dry up. SF is even recognized by the World Meteorological Organization (WMO) as a significant predictor of droughts and has been used in existing studies to forecast drought indicators, namely the standardized drought index (SDI) and standardized SF index (SSI) 2,3 . As history has shown, floods and droughts make water resource management and allocation extremely difficult, while also affecting other industries and activities such as hydropower generation, agriculture, and environmental protection 1,4-6 . Existing studies have also demonstrated the correlation of SF with river suspended sediment load (SSL). SF data has been used to obtain better predictions of SSL 7-10 , highlighting the effect of SF on SSL, with higher SF typically causing higher SSL. Furthermore, streamflow affects the capacity of rivers to receive pollution.
The water quality index (WQI) is commonly used to describe the water quality of streamflow and is affected by […] based on different rivers. Certain ML models or algorithms may predict SF accurately for a particular river but perform poorly for a different river, as they may be unable to capture the SF behaviour of that river. Existing studies in Peninsular Malaysia have developed ML algorithms, namely LR, M5P tree, RF, SVM, ANFIS, ARIMA, ANN, and LSTM, to predict SF in rivers such as Sungai Muda in Kedah; Sungai Kuantan and Sungai Kenau in Pahang; Sungai Kelantan in Kelantan; and Sungai Kurau, Sungai Bernam, and Sungai Tualang in Perak 26,29-31,42,53 . Apart from the studies by Zaini et al. 30 , Sammen et al. 31 , and Pandhiani et al. 53 , which used data sets from two hydrological stations or rivers to develop SF prediction models, SF prediction studies in Peninsular Malaysia have focused on data sets from only one hydrological station or river. This leaves a research gap: it is unknown whether a single ML model or algorithm can accurately predict SF for the many different rivers within Peninsular Malaysia, as no existing study has developed and tested ML models or algorithms on data sets from a substantial number of rivers in the region. The present study addresses this gap by developing SF prediction models based on SF time series from hydrological stations located along 11 different rivers throughout Peninsular Malaysia. The ML algorithms used are the SVM, ANN, and LSTM, because the literature review showed that they produce accurate SF predictions and outperform other ML algorithms in the field of SF prediction.
The literature review also highlighted noteworthy advantages of these algorithms that make them suitable for SF prediction in the present study. Hybridization of SVM, ANN, and LSTM is not investigated, because the aim is to identify the standalone ML model that is most accurate and most suitable as a universal model for 11 different river SF data sets in Peninsular Malaysia, which has not been done in existing studies. The findings may then open a topic for future study on hybridizing the standalone universal model proposed at the end of the present study.
Real-life adoption of an ML model proposed in the scientific literature for SF prediction may be complicated by doubt about whether the model can reproduce its accurate performance for different river case studies, which may have different SF magnitudes and behaviours due to spatial and temporal variability, as well as varying heterogeneity in water balance components. Meanwhile, developing an individual SF prediction model for each river within a region is resource intensive, as it may require significant time and cost. Rather than spending substantial resources on many tailor-made models, it would be more economical to identify one ML model that predicts SF with good accuracy for many different rivers within a region. The present study was therefore motivated by the idea of proposing a single universal ML model that has been substantially and simultaneously tested on different rivers and is capable of accurately predicting SF for any river case study within Peninsular Malaysia. The main contributions of the present study are the development and testing of SF prediction models using 3 ML algorithms and SF data sets from hydrological stations on 11 different rivers throughout Peninsular Malaysia, and the proposal of the best performing ML model as the universal model for accurate SF prediction in the region. The best performing ML model is selected by considering two factors: the number of data sets for which a model produced the most accurate predictions, and the reliability of each model in producing relatively high-accuracy predictions across the different data sets.
The accuracy of the ML models is quantified using selected performance evaluation measures, namely mean absolute error (MAE), root mean squared error (RMSE), coefficient of determination (R²), and ranking mean (RM). The findings may interest hydrological authorities or institutions searching for substantially tested ML models within Peninsular Malaysia, or even other regions. The rest of the present study is organized as follows. Section "Materials and methods" describes the materials and methods used to develop and test the SF prediction models. Section "Results and discussion" reports and discusses the performance of the SF prediction models. Section "Conclusion" concludes the study and provides suggestions for future studies.

Materials and methods
This section explains the materials and methods used to develop and test SF prediction models for the 11 selected rivers within Peninsular Malaysia. It describes the location and data of the case study, the model development process, feature selection, data pre-processing, the ML algorithms, and the performance measures.
Location and data of case study. The western region of Malaysia is known as Peninsular Malaysia. It comprises 11 states and 2 federal territories and has an area of approximately 132,265 km². Located just north of the equator, Peninsular Malaysia accounts for about 40% of Malaysia's land area. Malaysia's capital is the Federal Territory of Kuala Lumpur, which is located about 40 km from the coast. There are approximately 1235 river basins in Peninsular Malaysia, of which 74 are classified as main river basins while the remaining 1161 are categorized as small river basins 55 . The longest river in Peninsular Malaysia is Sungai Pahang, measuring up to 459 km in length.
The raw daily average SF data for rivers in 11 states of Peninsular Malaysia was obtained from the Water Resources Management and Hydrology Division of the Malaysian Department of Irrigation and Drainage. One river is selected per state based on the suitability of its data in terms of volume and time-series continuity, and on the significance of the river to its respective state or federal territory. Table 1 lists the selected river for each state, the SF station numbers, their latitudes and longitudes, and the data duration provided by each SF station.

Model development process. The process used to develop and test the SF prediction models in the present study comprises raw data collection, feature selection, data pre-processing, model prediction, and performance analysis, as illustrated in Fig. 1.
Feature selection. Feature selection is the process of selecting the input parameters fed to an algorithm for model training. It is important for identifying input parameter combinations that enable accurate model predictions. In the present study, only daily average streamflow (SF) data were available, and they were used to predict future SF; the present study is therefore univariate. Table 2 shows a statistical analysis of the daily average SF for each of the 11 selected rivers.
Given that the present study is univariate and two of the algorithms to be tested (SVM and ANN) are not traditional time-series forecasting algorithms, the SF data sets for each river are organized into sliding windows in order to reframe the time-series forecasting problem into a supervised learning problem. Before the data sets were organized into sliding windows, partial autocorrelation function (PACF) analyses were carried out on all the SF data sets in order to identify the lagged SF data that have significant correlation to the current-day SF data. Based on Fig. 2, it is found that for many of the SF data sets, the lagged SFs that are significantly correlated to the current-day SF [SF(t)] are the 1-day lagged SF [SF(t − 1)], 2-day lagged SF [SF(t − 2)], and 3-day lagged SF [SF(t − 3)].
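The sliding-window reframing and the PACF screening described above can be sketched in Python. This is a minimal illustration, not the authors' code: which library produced Fig. 2 is not stated (statsmodels' `pacf` and `plot_pacf` are common options), so the sketch estimates the PACF from its regression definition with NumPy. The helper names `make_lagged` and `pacf_ols` are hypothetical, and the AR(1) series is synthetic, not DID data.

```python
import numpy as np


def make_lagged(series, n_lags):
    """Reframe a univariate series as a supervised (X, y) problem.

    Each row of X holds [SF(t - n_lags), ..., SF(t - 1)] and the matching
    entry of y holds SF(t), so the last column of X is always the 1-day lag.
    """
    s = np.asarray(series, dtype=float)
    n = len(s)
    X = np.column_stack([s[i : n - n_lags + i] for i in range(n_lags)])
    y = s[n_lags:]
    return X, y


def pacf_ols(series, max_lag):
    """Regression-based PACF: the lag-k value is the coefficient of SF(t - k)
    in an OLS fit of SF(t) on SF(t - 1), ..., SF(t - k) plus an intercept."""
    values = []
    for k in range(1, max_lag + 1):
        X, y = make_lagged(series, k)
        A = np.column_stack([np.ones(len(y)), X])  # intercept + k lag columns
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        values.append(coef[1])  # column 1 of A is the oldest lag, SF(t - k)
    return np.array(values)


# Synthetic AR(1) series used only for illustration.
rng = np.random.default_rng(0)
sf = np.zeros(5000)
for t in range(1, len(sf)):
    sf[t] = 0.8 * sf[t - 1] + rng.normal()

pacf = pacf_ols(sf, max_lag=5)
bound = 1.96 / np.sqrt(len(sf))  # approximate 95% significance band
significant_lags = [k + 1 for k, v in enumerate(pacf) if abs(v) > bound]
X3, y3 = make_lagged(sf, 3)  # windows of SF(t-3), SF(t-2), SF(t-1) -> SF(t)
```

Lags whose PACF magnitude exceeds roughly 1.96/√N are the usual input candidates; for this synthetic AR(1) series lag 1 dominates, whereas Fig. 2 shows lags 1 to 3 as significant for many of the SF data sets.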
In addition, Pearson's correlation coefficient is used to further analyse the correlation between the current-day SF data [SF(t)] and the selected lagged SF data [SF(t − 1), SF(t − 2), SF(t − 3)]. Pearson's correlation coefficient, symbolized by $r_{xy}$, is calculated as:

$$ r_{xy} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}} $$

where $\bar{x}$ and $\bar{y}$ are the respective data means; $x_i$ and $y_i$ are individual data points; and n is the sample size.
The calculation of Pearson's correlation coefficient confirms a strong correlation between the current-day SF data [SF(t)] and the selected lagged SFs [SF(t − 1), SF(t − 2), SF(t − 3)] in the majority of the data sets. Table 3 shows the Pearson's correlation coefficient matrix for all 11 SF data sets used in the present study.
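As a sketch of how a matrix like Table 3 could be computed, the snippet below implements the $r_{xy}$ formula directly and applies it to SF(t) and its 1- to 3-day lags. The random-walk series is a hypothetical stand-in for a river's daily SF, and `lag_correlation_matrix` is an illustrative helper, not part of the study.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson's correlation coefficient r_xy computed from its definition."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))


def lag_correlation_matrix(series, max_lag=3):
    """Correlation matrix of [SF(t), SF(t-1), ..., SF(t-max_lag)]."""
    s = np.asarray(series, dtype=float)
    n = len(s)
    # Column k is the series shifted back by k days, all aligned on the same t.
    cols = [s[max_lag - k : n - k] for k in range(max_lag + 1)]
    size = len(cols)
    R = np.empty((size, size))
    for i in range(size):
        for j in range(size):
            R[i, j] = pearson_r(cols[i], cols[j])
    return R


# Hypothetical SF-like series (random walk around 100), for illustration only.
rng = np.random.default_rng(1)
sf = np.cumsum(rng.normal(size=500)) + 100.0
R = lag_correlation_matrix(sf)  # 4 x 4: SF(t), SF(t-1), SF(t-2), SF(t-3)
```

In practice `np.corrcoef` returns the same matrix and is the more idiomatic choice; the explicit loop only mirrors the formula term by term.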
The PACF and Pearson's correlation analyses show that the selected lagged SF data [SF(t − 1), SF(t − 2), SF(t − 3)] have strong predictive power for the current-day SF [SF(t)], so they are selected as input parameters in the present study. From these input parameters, three input parameter scenarios are designed and fed to the selected ML algorithms for model training. Testing different input parameter scenarios, as performed by existing studies 4,6,15,18,34,43,56 , allows the sensitivity of the models to different input combinations to be analysed and the best input parameter combination for accurate SF prediction to be determined. Table 4 describes the input parameter scenarios used in the present study. In total, 99 models (3 input parameter scenarios × 3 ML algorithms × 11 SF data sets) were run and evaluated.
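The model grid behind the 99 runs can be enumerated explicitly. In this sketch the model names follow the paper's convention (algorithm plus scenario number, e.g. ANN3); the data set indices 0 to 10 are placeholders for the 11 rivers, and the data structures are illustrative only.

```python
from itertools import product

# Input scenario k uses the k most recent lags, SF(t-1) ... SF(t-k).
SCENARIOS = {k: [f"SF(t-{lag})" for lag in range(1, k + 1)] for k in (1, 2, 3)}
ALGORITHMS = ["SVR", "ANN", "LSTM"]
N_DATASETS = 11  # one SF station per selected state (placeholder indices)

# Every (model name, data set) pair, e.g. ("ANN3", 0).
runs = [
    (f"{algo}{k}", dataset)
    for algo, k, dataset in product(ALGORITHMS, SCENARIOS, range(N_DATASETS))
]
```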
Data pre-processing. This section explains the pre-processing steps performed on the raw SF time-series data sets of the 11 selected rivers obtained from the Malaysian Department of Irrigation and Drainage. The data pre-processing steps comprise the imputation of missing data, data partitioning, and feature scaling.
Missing data. Most ML algorithm implementations raise errors when they encounter missing values in a data set. The raw SF time-series data sets obtained from the Malaysian Department of Irrigation and Drainage therefore needed to be processed, as they contained missing SF values. In existing SF studies, missing data has been imputed by interpolation or by filling in the missing values with the mean, or handled by removing the rows with missing data entirely 12,26,27,54 . In the present study, imputation through interpolation is used to fill in the missing data, using the imputeTS R package developed by Moritz and Bartz-Beielstein 57 . Linear and spline interpolation were both tested. Spline interpolation filled some missing SF data with negative values, which is not physically meaningful because the rivers flow in only one direction, so SF cannot be negative. Linear interpolation was therefore selected to fill the missing data. As a sample, the outcome of the imputation process for missing SF values in the Johor data set is shown in Fig. 3.
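The imputation itself was done in R with imputeTS, whose `na_interpolation` function offers both `option = "linear"` and `option = "spline"`. For readers working in Python, a minimal NumPy equivalent of linear interpolation is sketched below; the function name `fill_linear` and the sample values are hypothetical. Because every filled value is a weighted average of its two observed neighbours, linear interpolation cannot leave their range, so non-negative observations always give non-negative fills, whereas a cubic spline can overshoot below zero.

```python
import numpy as np


def fill_linear(values):
    """Fill NaN gaps by linear interpolation over the time index.

    Interior gaps lie on the straight line between the nearest observed
    neighbours; leading or trailing gaps take the nearest observed value
    (this is how np.interp treats points outside the observed range).
    """
    v = np.asarray(values, dtype=float)
    idx = np.arange(len(v))
    observed = ~np.isnan(v)
    filled = v.copy()
    filled[~observed] = np.interp(idx[~observed], idx[observed], v[observed])
    return filled


# Hypothetical daily SF record (m^3/s) with two gaps.
daily_sf = [12.0, np.nan, np.nan, 3.0, 4.5, np.nan, 6.0]
filled = fill_linear(daily_sf)  # gaps become 9.0, 6.0 and 5.25
```

In pandas, `Series.interpolate(method="linear")` gives the same interior fills.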

Table 3 (panels): Pearson's correlation coefficient matrices based on the Sungai Kepis (Negeri Sembilan), Sungai Pahang (Pahang), and Sungai Perak (Perak) data sets.
Table 3. Pearson's correlation coefficient matrix for data sets of each selected river.

Data partitioning. The SF data set for each river is partitioned into two subsets: a training set and a test set. The training set is used to develop the ML models and give them the ability to make SF predictions, while the test set is used to evaluate the models' predictive ability with the selected performance measures. According to Kannangara et al. 58 , an optimum ratio of training data to testing data is 80:20, and existing SF prediction studies have also obtained good results with this ratio 6,26 . Therefore, 80% of each river's SF data is used for training and the remaining 20% for testing. The training data is further split into a training set and a validation set. The validation set is used to monitor model performance after each epoch, which guides the tuning of the models. The size of the validation set was selected by trial and error; using 20% of the training data as the validation set produced the best SF prediction results. The duration of the training and testing sets for each river after data partitioning is given in Table 5.

Table 4. Input parameter scenarios designed for the present study.

| Output parameter | Input parameter scenario | Input parameter(s) | Description |
|---|---|---|---|
| SF(t) | 1 | SF(t − 1) | When SF data of the previous day is available |
| SF(t) | 2 | SF(t − 1), SF(t − 2) | When SF data of the previous 2 days is available |
| SF(t) | 3 | SF(t − 1), SF(t − 2), SF(t − 3) | When SF data of the previous 3 days is available |

Feature scaling. As SVM and the deep learning algorithms (ANN and LSTM) are sensitive to data scales, feature scaling is carried out on the SF data set of each river. Feature scaling puts the variables on comparable scales so that no variable dominates because of its magnitude, which speeds up convergence and helps minimize errors during training 43 . Two feature scaling methods are used depending on the ML algorithm: standardization before training the SVM models, and normalization before training the deep learning models. Feature scaling is applied to the input data, which the feature selection process determined to be the 1-day, 2-day, and 3-day lagged SF, and to the output data, which is the current-day SF. The outputs (raw predictions) of the ML models are then inverse transformed back to their original scale so that they can be correctly evaluated and compared using the selected performance measures.
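The partitioning and scaling steps can be combined into one sketch. It assumes the split is chronological (no shuffling), which is consistent with Table 5 reporting a duration for each subset, and that the scalers are fitted on the training portion only to avoid leaking test information; neither detail is stated explicitly. Keras's `validation_split` argument takes the last samples of the training arrays, much like the validation slice here. The function and class names are hypothetical; scikit-learn's `StandardScaler` and `MinMaxScaler` provide the same transforms.

```python
import numpy as np


def chronological_split(n, test_frac=0.2, val_frac=0.2):
    """Return index arrays (train, val, test) in time order, without shuffling."""
    n_train_total = int(round(n * (1 - test_frac)))  # 80% for training overall
    n_val = int(round(n_train_total * val_frac))     # last 20% of that for validation
    idx = np.arange(n)
    cut = n_train_total - n_val
    return idx[:cut], idx[cut:n_train_total], idx[n_train_total:]


class Standardizer:
    """z = (x - mean) / std, fitted on training data (used before SVR)."""

    def fit(self, x):
        self.mean, self.std = x.mean(axis=0), x.std(axis=0)
        return self

    def transform(self, x):
        return (x - self.mean) / self.std

    def inverse_transform(self, z):
        return z * self.std + self.mean


class MinMaxNormalizer:
    """x' = (x - min) / (max - min), fitted on training data (used before ANN/LSTM)."""

    def fit(self, x):
        self.lo, self.hi = x.min(axis=0), x.max(axis=0)
        return self

    def transform(self, x):
        return (x - self.lo) / (self.hi - self.lo)

    def inverse_transform(self, z):
        return z * (self.hi - self.lo) + self.lo


sf = np.linspace(1.0, 50.0, 1000)  # hypothetical SF series in m^3/s
train, val, test = chronological_split(len(sf))
scaler = MinMaxNormalizer().fit(sf[train])
sf_scaled = scaler.transform(sf)
restored = scaler.inverse_transform(sf_scaled)  # back to m^3/s
```

Because the normalizer sees only the training slice, scaled test values can fall outside [0, 1]; predictions are mapped back to m³/s with `inverse_transform` before the performance measures are computed.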

Machine learning algorithms.
In the present study, established ML algorithms, namely SVM and two deep learning algorithms (ANN and LSTM), were selected for the development and testing of SF prediction models. SVM, ANN, and LSTM are regarded as established in the field of SF prediction because numerous recent studies have demonstrated their effectiveness 1,6,13-18,25-34,41-44,59 . The Python programming language was used to develop and test the SF prediction models because it is easy to write and read and has extensive library support. Table 6 details the experimental setup used in developing the SF prediction models.

Support vector machine (SVM).
The SVM is a kernel-based algorithm that uses structural risk minimization and statistical learning methods to achieve good generalization capacity by minimizing the generalization error rather than only the training error 1,13,17 . SVM uses a kernel function to non-linearly map input vectors into a high-dimensional feature space, which helps to reduce the complexity of optimization 13,17 . Support vector regression (SVR), the regression form of SVM used in the present study, approximates a regression function based on a set of support vectors drawn from the training data set 1 . According to existing studies 1,17 , the SVR function is given by:

$$ f(x) = \sum_{i=1}^{n} (\alpha_i - \alpha_i^{*})\, K(x_i, x) + b $$

where $\alpha_i$ and $\alpha_i^{*}$ are Lagrange multipliers, $K(x_i, x)$ is the kernel function, and $b$ is the bias. The kernel function is the main SVR hyperparameter that must be selected before running the SVR models. The kernel functions that can be employed are the radial basis function (RBF), linear, polynomial, and sigmoid kernels. Existing literature supports RBF as the best kernel function due to its optimization efficiency and adaptability 1,13 . Trial and error confirmed that RBF produced the best SF predictions, so it was chosen as the SVR kernel function in the present study. All other SVR hyperparameters were left at their default values, as satisfactory SF predictions were obtained. Table 7 shows the hyperparameter tuning for SVR in the present study.
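To make the SVR function concrete, the sketch below evaluates f(x) with an RBF kernel, K(x_i, x) = exp(−γ‖x_i − x‖²), given already-trained quantities. The support vectors, dual coefficients (α_i − α_i*), bias, and γ are made-up values. In scikit-learn, `sklearn.svm.SVR(kernel="rbf")` exposes the fitted equivalents as `support_vectors_`, `dual_coef_`, and `intercept_`, with `gamma="scale"` (1 / (n_features · X.var())) as the default; which SVR implementation the study used is not stated, so this only illustrates the formula.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """K(a, b) = exp(-gamma * ||a - b||^2) for every row pair of A and B."""
    A = np.atleast_2d(A).astype(float)
    B = np.atleast_2d(B).astype(float)
    sq_dist = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))  # clip tiny negative round-off


def svr_predict(X, support_vectors, dual_coef, intercept, gamma):
    """f(x) = sum_i (alpha_i - alpha_i*) K(x_i, x) + b.

    dual_coef holds the differences (alpha_i - alpha_i*), one per support vector.
    """
    K = rbf_kernel(X, support_vectors, gamma)  # shape (n_queries, n_sv)
    return K @ np.asarray(dual_coef, dtype=float) + intercept


# Hypothetical trained quantities for 3-lag (scaled) inputs.
sv = np.array([[0.1, 0.2, 0.3], [0.5, 0.4, 0.6]])
coef = np.array([0.7, -0.2])
b = 0.05
pred = svr_predict(sv[:1], sv, coef, b, gamma=1.0)
```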
Artificial neural network (ANN). The ANN is a deep learning algorithm inspired by the neural connections in the biological functions of the human brain 33 . It essentially comprises three types of layers: the input layer, hidden layer, and output layer 26,27,33 . The ANN architecture consists of processing units called neurons, also referred to as nodes 26 . The nodes of adjacent layers are linked by connections whose strengths are referred to as weights 26,27 . These weights give the ANN a high degree of flexibility, allowing it to adapt freely to the input data 27 . The number of layers and nodes required to solve a prediction problem typically depends on the complexity of the problem, with more difficult problems usually requiring more layers or nodes. An ANN is essentially characterized by its architecture (the layers, nodes, and connections), the connection weights between neurons, the activation function, and the training algorithm that adjusts the weights 26 . The training algorithm reduces errors by adjusting the connection weights and biases within the ANN architecture. The adjusted connection weights are multiplied by the input values, and the adjusted biases are added. Finally, the results are passed through the activation function to generate the final output, which in the present study is the SF prediction. As explained by Zakaria et al. 26 , the ANN mathematical model can be described by the equation:

$$ y_j = f\left( \sum_{i=1}^{N} \omega_{ij}\, x_i + b_j \right) $$

where $y_j$ is the output variable of the jth neuron, N is the number of neurons in the preceding layer, $\omega_{ij}$ is the weight connecting the ith neuron and the jth neuron, $x_i$ is the ith element of the input vector, $b_j$ is the bias of the jth neuron, and f is the activation function.
As explained by Zamanisabzi et al. 33 , trial and error is needed to determine the best hyperparameters for an ANN architecture, as different problems have different hidden relationships within the data. Trial and error showed that two hidden layers with 6 neurons each were optimal for SF prediction in the present study, as this configuration adapted well to the 11 different river data sets. In addition, different numbers of epochs, training algorithms, activation functions, and batch sizes were tested to find the best possible ANN architecture within the context of the present study. The best ANN architecture found is shown in Table 8. All other ANN hyperparameters were left at their default values, as satisfactory SF predictions were obtained.

During the training of each ANN model, train and validation loss vs epochs graphs are produced to verify graphically that the losses decrease and converge, and that overfitting does not occur. As a sample, the loss vs epochs graph for the best performing ANN model (ANN3) on the Johor data set is shown in Fig. 4. The validation loss is lower than the train loss, which is attributed to the small size of the validation set (20% of the training set). Increasing the validation set size might narrow this gap; however, the best SF predictions were obtained with a training-to-validation ratio of 80:20, so this ratio was maintained for training the ANN models.
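A minimal NumPy sketch of the forward pass of the selected ANN architecture (input layer, two hidden layers of 6 neurons, one output neuron), applying the neuron model y = f(Σ ωx + b) layer by layer. The ReLU activation, He-style random initialization, and helper names are assumptions for illustration, since Table 8 is not reproduced here, and the weights are untrained.

```python
import numpy as np


def init_mlp(n_inputs, hidden=(6, 6), seed=0):
    """Random (W, b) pairs for an MLP: n_inputs -> 6 -> 6 -> 1."""
    rng = np.random.default_rng(seed)
    sizes = [n_inputs, *hidden, 1]
    return [
        (rng.normal(scale=np.sqrt(2.0 / fan_in), size=(fan_in, fan_out)), np.zeros(fan_out))
        for fan_in, fan_out in zip(sizes[:-1], sizes[1:])
    ]


def forward(params, X):
    """Each hidden layer computes f(X @ W + b); ReLU is the assumed f."""
    h = np.asarray(X, dtype=float)
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)
    W_out, b_out = params[-1]
    return h @ W_out + b_out  # linear output neuron: the (scaled) SF prediction


def n_params(params):
    """Total number of trainable weights and biases."""
    return sum(W.size + b.size for W, b in params)


params = init_mlp(n_inputs=3)  # ANN3: inputs SF(t-1), SF(t-2), SF(t-3)
y_hat = forward(params, np.zeros((4, 3)))  # a batch of 4 hypothetical windows
```

With three inputs (scenario 3) this network has 73 trainable parameters; with one input (scenario 1) it has 61.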
Long short-term memory (LSTM). The LSTM is an advanced version of the recurrent neural network (RNN) that helps to overcome the vanishing and exploding gradient problems present in the standard RNN 44 . It uses control gates to store, remove, update, and control the flow of information in a unique structure known as the memory cell 43,44 . The LSTM uses three types of control gates: the input gate, the output gate, and the forget gate 42-44 . The input gate controls the flow of new information into the cell state; the output gate selects information from the cell state to be forwarded to a dense layer containing a single neuron, where the final output value is calculated; and the forget gate determines the amount of information to be removed from the previous cell state 43,44 . The operation of the control gates filters relevant information as required, which contributes to minimizing errors. As described by existing studies 43,44 , the LSTM output can be described by the function:

$$ h_t = o_t \odot \tanh(C_t) $$

where $h_t$ is the output, $o_t$ is the output gate, $\odot$ is the Hadamard product, and $C_t$ is the cell state value at time t.
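One LSTM time step can be written out in NumPy to show how the gates combine: the forget gate scales the previous cell state, the input gate admits a tanh candidate, and the output gate produces h_t = o_t ⊙ tanh(C_t). This is a textbook formulation sketched under stated assumptions: the stacked weight layout and gate ordering are arbitrary choices here (frameworks order the blocks differently internally), and the weights are random or zero, not trained.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W (input weights), U (recurrent weights) and b (bias) stack the four gate
    blocks in the order: forget, input, candidate, output (an arbitrary choice).
    """
    z = x_t @ W + h_prev @ U + b
    f, i, g, o = np.split(z, 4, axis=-1)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)  # gate activations in (0, 1)
    g = np.tanh(g)                                # candidate cell values
    c_t = f * c_prev + i * g                      # forget old, add new information
    h_t = o * np.tanh(c_t)                        # h_t = o_t ⊙ tanh(C_t)
    return h_t, c_t


units, n_features = 4, 1
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(n_features, 4 * units))
U = rng.normal(scale=0.1, size=(units, 4 * units))
b = np.zeros(4 * units)

# Run a hypothetical 3-day window [SF(t-3), SF(t-2), SF(t-1)] (scaled values).
h, c = np.zeros(units), np.zeros(units)
for x in np.array([[0.2], [0.4], [0.3]]):
    h, c = lstm_step(x, h, c, W, U, b)

# With all-zero weights every gate is 0.5 and the candidate is 0.
h0, c0 = lstm_step(np.array([0.3]), np.zeros(units), np.ones(units),
                   np.zeros((1, 4 * units)), np.zeros((units, 4 * units)), np.zeros(4 * units))
```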
As with ANNs, LSTMs consist of hidden layers of neurons, so a trial-and-error process is needed to find the optimal number of hidden layers and neurons. Trial and error showed that two hidden layers with 50 neurons each were optimal for SF prediction in the present study, as this configuration adapted well to the 11 different river data sets. In addition, different numbers of epochs, numbers of time steps, training algorithms, dropout regularization on each hidden layer, activation functions, recurrent activation functions, and batch sizes were tested to find the best possible LSTM architecture within the context of the present study. The best LSTM architecture found is shown in Table 9. All other LSTM hyperparameters, including the initializers, regularizers, and constraints, were left at their default values, as satisfactory SF predictions were obtained.
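Two practical details of the LSTM setup can be sketched: the lagged inputs must be reshaped to (samples, time steps, features), and the size of the 2 × 50 architecture can be checked by counting parameters. Treating each lag as one time step with a single feature is an assumption, as is the Keras-style count with one bias vector per gate (PyTorch keeps two bias vectors, which changes the count). Dropout adds no parameters. The helper names and input values are hypothetical.

```python
import numpy as np


def to_lstm_input(X):
    """(samples, lags) -> (samples, time steps = lags, features = 1).

    Column order is preserved, so if the columns run oldest to newest the
    LSTM reads each window forward in time.
    """
    X = np.asarray(X, dtype=float)
    return X.reshape(X.shape[0], X.shape[1], 1)


def lstm_layer_params(input_dim, units):
    """Weights of one LSTM layer: 4 gates x (input + recurrent weights + bias)."""
    return 4 * ((input_dim + units) * units + units)


# Two stacked 50-unit LSTM layers followed by a single-neuron dense output.
units = 50
total = lstm_layer_params(1, units) + lstm_layer_params(units, units) + (units + 1)

X3 = np.arange(12, dtype=float).reshape(4, 3)  # hypothetical 3-lag windows
X3_seq = to_lstm_input(X3)
```

Under these assumptions the first layer has 10,400 parameters, the second 20,200, and the dense output 51, for 30,651 in total.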
During the training of each LSTM model, train and validation loss vs epochs graphs are produced to verify graphically that the losses decrease and converge, and that overfitting does not occur. As a sample, the loss vs epochs graph for the best performing LSTM model (LSTM2) on the Johor data set is shown in Fig. 5. As in Fig. 4, the validation loss is lower than the train loss, which is attributed to the small size of the validation set (20% of the training set). Increasing the validation set size might narrow this gap; however, the best SF predictions were obtained with a training-to-validation ratio of 80:20, so this ratio was maintained for training the LSTM models. Additionally, the higher train loss may be due to the dropout regularization applied in […]

Mean absolute error (MAE).

$$ \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert $$

where $y_i$ is the real value, $\hat{y}_i$ is the predicted value, and n is the sample size.

Root mean squared error (RMSE).

$$ \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2} $$

where $y_i$ is the real value, $\hat{y}_i$ is the predicted value, and n is the sample size.
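Both error measures are short to implement; the sketch below uses hypothetical values in m³/s. RMSE squares the errors before averaging, so it weights large misses (for example at flood peaks) more heavily than MAE, and RMSE ≥ MAE always holds.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error, in the units of y (m^3/s for SF)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean(np.abs(y_true - y_pred))


def rmse(y_true, y_pred):
    """Root mean squared error, in the units of y (m^3/s for SF)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))


y = [10.0, 20.0, 30.0]      # hypothetical observed SF
y_hat = [12.0, 18.0, 36.0]  # hypothetical predictions: errors of 2, 2 and 6
```

Here MAE = 10/3 ≈ 3.33 m³/s, while the single 6 m³/s miss pushes RMSE up to √(44/3) ≈ 3.83 m³/s.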

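Coefficient of determination (R²). The R² measures the proportion of the variance in the real values that is explained by the predicted values. R² scores are unitless, with a maximum of 1; an R² closer to 1 indicates closer agreement between real and predicted values, while a negative R² indicates predictions worse than simply using the mean of the real values. The following equation is used to calculate R²:

$$ R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} $$

where $y_i$ is the real value, $\hat{y}_i$ is the predicted value, $\bar{y}$ is the mean of $y_i$, and n is the sample size.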
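The sketch below implements the R² equation (the same definition used by scikit-learn's `r2_score`) and shows its range with toy values: a perfect prediction gives 1, always predicting the mean gives 0, and a poor prediction can fall well below −1.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


y = [1.0, 2.0, 3.0]                 # toy observed values
perfect = r2_score(y, [1.0, 2.0, 3.0])   # 1
mean_only = r2_score(y, [2.0, 2.0, 2.0])  # 0
reversed_ = r2_score(y, [3.0, 2.0, 1.0])  # 1 - 8/2 = -3
```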

Ranking mean (RM).
To compute the RM, each model is first ranked on each of the selected performance measures, which in the present study are MAE, RMSE, and R². Each model's RM is then calculated as the average of its ranks for MAE, RMSE, and R². Because rank 1 denotes the best score, a lower RM indicates better overall performance of a model relative to the other models. RM is defined by:

$$ \mathrm{RM} = \frac{1}{n}\sum_{k=1}^{n} R_k $$

where $R_k$ is the model's rank for the kth performance evaluation measure and n is the number of measures used, which is 3.
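The RM computation can be sketched as follows. How ties were ranked is not stated, so the sketch assumes competition ranking (tied models share the better rank); the helper names are hypothetical and the scores are toy values, not results from Tables 10 to 20.

```python
import numpy as np


def competition_rank(scores, higher_is_better=False):
    """Rank 1 = best; tied scores share the better rank (1, 2, 2, 4 style)."""
    s = np.asarray(scores, dtype=float)
    # better[i, j] is True when model j strictly beats model i.
    if higher_is_better:
        better = s[None, :] > s[:, None]
    else:
        better = s[None, :] < s[:, None]
    return 1 + better.sum(axis=1)


def ranking_mean(mae, rmse, r2):
    """Average rank over the n = 3 measures; lower RM is better."""
    ranks = np.vstack([
        competition_rank(mae),                        # lower MAE is better
        competition_rank(rmse),                       # lower RMSE is better
        competition_rank(r2, higher_is_better=True),  # higher R^2 is better
    ])
    return ranks.mean(axis=0)


# Toy scores for three hypothetical models.
rm = ranking_mean(mae=[1.0, 2.0, 3.0], rmse=[2.0, 1.0, 3.0], r2=[0.8, 0.9, 0.5])
```

In the toy example the second model ranks first on RMSE and R² but second on MAE, giving RM = 4/3, which beats the first model's 5/3; the third model ranks last on everything (RM = 3).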

Results and discussion
This section presents and discusses the performance of the models developed for SF prediction. The model performances are then compared and analysed.
Performance of models based on the Sungai Johor, Johor data set. The best overall performance in predicting SF for the Sungai Johor, Johor data set was produced by model ANN3, which is based on the ANN algorithm and input parameter scenario 3. ANN3 outperformed the other models with MAE, RMSE, and R² scores of 4.7235 m³/s, 10.0746 m³/s, and 0.9443 respectively, hence obtaining the best (lowest) RM of 1.00. SVR2 was the best SVR model (RM = 4.00), while LSTM2 was the best LSTM model (RM = 7.00). The models' performance scores and the actual vs predicted SF of the best model for each algorithm on the Sungai Johor test set are shown in Table 10 and Fig. 6 respectively.
Performance of models based on the Sungai Muda, Kedah data set. Model SVR3, based on the SVR algorithm and input parameter scenario 3, produced the best overall performance in predicting SF for the Sungai Muda, Kedah data set. SVR3 significantly outperformed the other models in terms of MAE with a score of 12.3853 m³/s, hence obtaining the best RM with a score of 1.67. ANN2 achieved the best RMSE and R² with scores of 29.6536 m³/s and 0.8911 respectively. ANN2 was the best ANN model (RM = 2.67), while LSTM1 was the best LSTM model (RM = 7.00). The models' performance scores and the actual vs predicted SF of the best model for each algorithm on the Sungai Muda test set are shown in Table 11 and Fig. 7 respectively.
[…] Table 12 and Fig. 8 respectively.
Performance of models based on the Sungai Melaka, Melaka data set. The best overall performance in predicting SF for the Sungai Melaka, Melaka data set was produced by model ANN1, which is based on the ANN algorithm and input parameter scenario 1. ANN1 outperformed the other models with MAE, RMSE, and R² scores of 2.7113 m³/s, 6.0824 m³/s, and 0.6809 respectively, hence obtaining the best (lowest) RM of 1.00. SVR1 was the best SVR model (RM = 3.67), while LSTM1 was the best LSTM model (RM = 7.67). The models' performance scores and the actual vs predicted SF of the best model for each algorithm on the Sungai Melaka test set are shown in Table 13 and Fig. 9 respectively.
Performance of models based on the Sungai Kepis, Negeri Sembilan data set. The best overall performance in predicting SF for the Sungai Kepis, Negeri Sembilan data set was produced by model LSTM3, which is based on the LSTM algorithm and input parameter scenario 3. LSTM3 outperformed the other models with MAE, RMSE, and R² scores of 0.4969 m³/s, 2.6430 m³/s, and 0.0202 respectively, hence obtaining the best (lowest) RM of 1.00. SVR1 and SVR2 were the joint-best SVR models (RM = 4.67), while ANN2 was the best ANN model (RM = 7.00). The models' performance scores and the actual vs predicted SF of the best model for each algorithm on the Sungai Kepis test set are shown in Table 14 and Fig. 10 respectively.
[…] Table 15 and Fig. 11 respectively.
Figure 10. Actual vs predicted SF of best models based on each algorithm for Sungai Kepis test set.
Figure 11. Actual vs predicted SF of best models based on each algorithm for Sungai Pahang test set.

Performance of models based on the Sungai Perak, Perak data set. The best overall performance in predicting SF for the Sungai Perak, Perak data set was produced by model ANN2, which is based on the ANN algorithm and input parameter scenario 2. ANN2 outperformed the other models with MAE, RMSE, and R² scores of 18.1337 m³/s, 29.3009 m³/s, and 0.8286 respectively, hence obtaining the best (lowest) RM of 1.00. SVR2 was the best SVR model (RM = 4.33), while LSTM3 was the best LSTM model (RM = 7.00). The models' performance scores and the actual vs predicted SF of the best model for each algorithm on the Sungai Perak test set are shown in Table 16 and Fig. 12 respectively.
Performance of models based on the Sungai Arau, Perlis data set. The best overall performance in predicting SF for the Sungai Arau, Perlis data set was produced by model ANN3, which is based on the ANN algorithm and input parameter scenario 3. ANN3 outperformed the other models with MAE, RMSE, and R² scores of 0.5441 m³/s, 1.4007 m³/s, and 0.6857 respectively, hence obtaining the best (lowest) RM of 1.00. SVR1 was the best SVR model (RM = 4.00), while LSTM2 was the best LSTM model (RM = 7.00). The models' performance scores and the actual vs predicted SF of the best model for each algorithm on the Sungai Arau test set are shown in Table 17 and Fig. 13 respectively.
Performance of models based on the Sungai Selangor, Selangor data set. The best overall performance in predicting SF for the Sungai Selangor, Selangor data set was produced by model ANN3, which is based on the ANN algorithm and input parameter scenario 3. ANN3 outperformed the other models with MAE, RMSE, and R² scores of 7.2175 m³/s, 13.9196 m³/s, and 0.8851 respectively, hence obtaining the best (lowest) RM of 1.00. SVR1 was the best SVR model (RM = 4.67), while LSTM3 was the best LSTM model (RM = 7.00). The models' performance scores and the actual vs predicted SF of the best model for each algorithm on the Sungai Selangor test set are shown in Table 18 and Fig. 14 respectively.
Performance of models based on the Sungai Dungun, Terengganu data set. The best overall performance in predicting SF for the Sungai Dungun, Terengganu data set was produced by model ANN1, which is based on the ANN algorithm and input parameter scenario 1. ANN1 outperformed the other models with MAE, RMSE, and R² scores of 18.8022 m³/s, 51.8025 m³/s, and 0.8631 respectively, hence obtaining the best (lowest) RM of 1.00. SVR1 was the best SVR model (RM = 4.00), while LSTM1 was the best LSTM model (RM = 7.00). The models' performance scores and the actual vs predicted SF of the best model for each algorithm on the Sungai Dungun test set are shown in Table 19 and Fig. 15 respectively.
Figure 13. Actual vs predicted SF of best models based on each algorithm for Sungai Arau test set.
… Table 20 and Fig. 16 respectively.
Overall comparison and discussion of model performances. Two evaluations are used to compare and analyse the models' performances. The first is the number of data sets for which a model produced the best predictive performance, and the second is the reliability of each model in producing SF predictions of relatively high accuracy. In the present study, ANN3 produced the best predictive performance for 4 out of the 11 tested data sets (Sungai Johor, Sungai Pahang, Sungai Arau, and Sungai Selangor). SVR3 was the most accurate model for 3 out of the 11 tested data sets (Sungai Muda, Sungai Kelantan, and Sungai Klang), and ANN1 was the most accurate model for 2 out of the 11 tested data sets (Sungai Melaka and Sungai Dungun). Lastly, ANN2 and LSTM3 achieved the best SF predictions for one data set each, namely Sungai Perak and Sungai Kepis respectively. Overall, ANN3 produced the most accurate SF predictions for more data sets than any other tested model. Further analysis reveals that the ANN algorithm and input scenario 3 produced the best SF predictive performance for the most data sets, accounting for the best SF predictions in 7 out of 11 and 8 out of 11 data sets respectively. A matrix of the most accurate algorithm and input scenario for each data set, and the parameters with the highest number of best prediction results, are given in Tables 21 and 22 respectively. Next, the reliability of each model in producing relatively high-accuracy SF predictions across different data sets is evaluated by calculating and comparing the average of the RM scores obtained by each model over all 11 tested data sets.
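The win counts in this paragraph follow directly from the per-data-set winners it lists. The short sketch below recomputes them; the dictionary restates the results reported in the text, and the algorithm and input scenario are read from the model name (e.g. "ANN3" is the ANN algorithm with input scenario 3), following the naming used throughout the study.

```python
from collections import Counter

# Best model for each data set, as reported in the text (see Table 21).
best_model = {
    "Sungai Johor": "ANN3",
    "Sungai Pahang": "ANN3",
    "Sungai Arau": "ANN3",
    "Sungai Selangor": "ANN3",
    "Sungai Muda": "SVR3",
    "Sungai Kelantan": "SVR3",
    "Sungai Klang": "SVR3",
    "Sungai Melaka": "ANN1",
    "Sungai Dungun": "ANN1",
    "Sungai Perak": "ANN2",
    "Sungai Kepis": "LSTM3",
}

# A model name is the algorithm followed by a single-digit input scenario.
model_wins = Counter(best_model.values())
algorithm_wins = Counter(name[:-1] for name in best_model.values())
scenario_wins = Counter(int(name[-1]) for name in best_model.values())

print(model_wins.most_common())
print(algorithm_wins["ANN"], "of", len(best_model), "data sets won by ANN")  # 7 of 11
print(scenario_wins[3], "of", len(best_model), "data sets won with scenario 3")  # 8 of 11
```

Because each data set has exactly one winner, the per-model counts (4, 3, 2, 1, 1) sum to 11, and the algorithm and scenario totals are simply regroupings of those same winners.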
This evaluation is significant because it identifies the predictive models that are most robust and most capable of adapting to different data sets, which may vary in SF magnitude and behaviour depending on spatial and temporal factors as well as the heterogeneity of water balance components. Based on Table 23 and Fig. 17, ANN2 exhibits the best (lowest) average RM, with a score of 3.21. This makes ANN2 the most reliable of the tested models in predicting SF with relatively high accuracy across different data sets. ANN3 produced the second-best average RM (3.27), which is very close to that of ANN2, while ANN1 produced the third-best average RM (3.79). Overall, the top three average RM scores were all produced by the ANN models.
The best model for SF prediction in the present study is then selected based on both evaluations: the number of data sets for which a model produced the best predictive performance, and the reliability of each model in producing SF predictions of relatively high accuracy. For the first evaluation, Table 21 shows that ANN3 was the most accurate SF predictive model for 4 out of the 11 tested data sets, more than any other tested model. For the second evaluation, ANN2 produced the best average RM, as shown in Table 23 and Fig. 17, indicating that it was the most reliable model in producing relatively high-accuracy SF predictions. The two evaluations therefore point to different best models, ANN2 and ANN3. To determine the best overall model in the present study, the performances of ANN2 and ANN3 are compared side by side.
With regard to the first evaluation, Table 21 shows a clear difference between ANN2 and ANN3: ANN2 produced the best SF predictive performance for only 1 out of the 11 tested data sets, whereas ANN3 outperformed the other models in 4 out of the 11 tested data sets. Meanwhile, the second evaluation shows that although ANN2 has the best average RM of all the models, the difference between the average RMs of ANN2 and ANN3 (3.21 vs 3.27) is very small, as can be seen in Table 23 and Fig. 17. Based on these analyses, ANN3 is selected and proposed as the universal ML model capable of predicting SF with high accuracy for rivers within Peninsular Malaysia. Although ANN2 obtained the best average RM, it produced the best predictive performance for only 1 out of the 11 tested data sets, considerably fewer than ANN3, which outperformed all the other models for 4 out of the 11 tested data sets; hence ANN3 was selected as the best model. Table 21 and Fig. 17 highlight ANN as the most suitable and successful algorithm in the present study, with SVR the second-best algorithm and LSTM the poorest performing algorithm. The LSTM predictive performance was markedly poorer than that of the ANN and SVR algorithms, as LSTM outperformed ANN and SVR for only one data set while exhibiting the poorest average RMs of all the algorithms. The poor performance of LSTM in the present study is attributed to the volatility and lack of a clear temporal pattern in the SF data sets, as LSTMs are generally most effective on problems with clear temporal patterns. On the other hand, ANN and SVR performed better, as these regression-based methods appear to be better suited to the present problem of predicting SF in Peninsular Malaysia.
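The choice between ANN2 and ANN3 above is made qualitatively: ANN3 wins the most data sets, and its average RM is judged to be very close to that of ANN2. One hypothetical way to make this trade-off explicit is sketched below. The win counts and average RM values are those quoted in the text, but the `select_model` rule and its `tolerance` threshold are illustrative assumptions, not a procedure stated in the study.

```python
# Win counts (Table 21) and average RM values (Table 23) as quoted in the
# text; only these three models have average RM values given in the text.
wins = {"ANN1": 2, "ANN2": 1, "ANN3": 4}
avg_rm = {"ANN1": 3.79, "ANN2": 3.21, "ANN3": 3.27}


def select_model(wins, avg_rm, tolerance=0.1):
    """Hypothetical rule: among models whose average RM lies within
    `tolerance` of the best (lowest) average RM, pick the one that won the
    most data sets. The tolerance is an assumed threshold for a "very small"
    difference; the study does not define one.
    """
    best_rm = min(avg_rm.values())
    near_best = [m for m in avg_rm if avg_rm[m] - best_rm <= tolerance]
    # Most wins first; any remaining tie goes to the lower average RM.
    return min(near_best, key=lambda m: (-wins[m], avg_rm[m]))


print(select_model(wins, avg_rm))                  # ANN3
print(select_model(wins, avg_rm, tolerance=0.05))  # ANN2
```

With a tolerance of 0.1, the 0.06 gap between ANN3 and ANN2 is treated as negligible and the win count decides in favour of ANN3; with 0.05, the gap is no longer negligible and the rule falls back to ANN2. This shows that the final choice hinges on what is considered a negligible difference in average RM.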
The superiority of the ANN algorithm over the other algorithms in predicting SF may be attributed to the general advantages of ANNs. In addition to easily handling large data sets, detecting complex non-linear relationships, and relating input and output parameters without the need for complex mathematical calculations, the ANN algorithm is also able to learn from data by itself and produce outputs or predictions that are not limited to the inputs provided to it. These advantages appear to have facilitated highly accurate SF predictions by the ANN algorithm, which produced the best SF predictive performance for the most data sets (7 out of 11) of all the algorithms. In addition, Figs. 6 to 16 show that the ANN algorithm predicts extreme SF values, or SF spikes, more accurately than the other algorithms. Input scenario 3 was the most successful when coupled with the ANN algorithm, as the ANN3 model outperformed all other models in 4 out of the 11 tested data sets while obtaining among the best average RM scores in the present study. This may be because input scenario 3 provides an optimum amount of useful historical SF input for the ANN algorithm, enabling the ANN3 model to produce highly accurate SF predictions and outperform the other SF predictive models in the present case study.
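Under the univariate set-up used in the study, input scenario 3 feeds the previous 3 days of daily average SF to the model to predict the current day's SF. A minimal sketch of how such lagged inputs can be assembled is shown below. `make_lagged` is a hypothetical helper, and mapping scenarios 1 and 2 to the previous 1 and 2 days is an assumption, since the text states the contents of scenario 3 only.

```python
def make_lagged(series, n_lags):
    """Turn a daily SF series into (X, y) pairs for a univariate model.

    Each row of X holds the previous `n_lags` daily SF values (oldest first)
    and the matching y is the SF on the current day. n_lags = 3 corresponds
    to input scenario 3; n_lags = 1 and 2 are assumed to match scenarios 1
    and 2.
    """
    if n_lags < 1:
        raise ValueError("n_lags must be at least 1")
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(series[t - n_lags:t])  # SF on days t-n_lags .. t-1
        y.append(series[t])             # SF on day t (the target)
    return X, y


# Hypothetical daily average SF values (m^3/s).
sf = [10.0, 12.0, 11.0, 15.0, 14.0]
X, y = make_lagged(sf, 3)
print(X)  # [[10.0, 12.0, 11.0], [12.0, 11.0, 15.0]]
print(y)  # [15.0, 14.0]
```

The resulting X and y could then be passed to any regression model, such as an ANN; note that each additional lag removes one usable sample from the start of the series.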
When compared to existing studies, the findings of Ateeq-ur-Rauf 25 are consistent with those of the present study, as the ANN algorithm outperformed the SVM algorithm. Other existing studies also point towards ANN as the superior ML algorithm for SF prediction when compared to other ML algorithms 26-29 . Conversely, some existing studies contradict the present study's findings, having shown the SVM and LSTM algorithms to perform better than the ANN algorithm in predicting SF 6,13,14,16,17,42,43 . This may be due to differences in experimental setup, such as the input and output parameters; forecast horizons; data set characteristics, including the number of data sets and the amount of data available for training and testing; study location; the magnitude and behaviour of SF in the selected river; and the ML algorithm hyperparameter setup. In the present study, the SVM algorithm did show that it can outperform the ANN algorithm, as it predicted SF better for 3 out of the 11 tested data sets, namely the Sungai Muda, Sungai Kelantan, and Sungai Klang data sets. However, the ANN algorithm is superior overall, as it outperformed both the SVM and LSTM algorithms in 7 of the remaining 8 tested data sets while also obtaining better average RMs, as shown in Table 21 and Fig. 17. Therefore, the ANN algorithm can be considered the most accurate and effective ML algorithm for SF prediction under the present study's experimental setup, namely a univariate approach that uses lagged daily average SF to predict the current daily average SF for 11 different data sets from rivers throughout Peninsular Malaysia. Although the ANN3 model produced good SF predictive performance in the present study, it could potentially be improved further.
Hybridization and the use of optimization algorithms to improve the selection of the ML algorithms' hyperparameters may enhance predictive capability and accuracy. Rainfall data could also be used as an input parameter to improve SF predictive performance, given that existing studies have shown rainfall to be correlated with SF 12,34,61 . These elements were not investigated in the present study and are therefore suggested for future work.

Conclusion
In the present study, daily average SF time series data for 11 different rivers throughout Peninsular Malaysia were collected and used to develop ML models that predict future SF. Three types of ML algorithms were used, namely SVM, ANN, and LSTM. The quantitative analyses show that the ANN3 model, which is based on the ANN algorithm and input scenario 3 (inputs comprising the previous 3 days of SF data), is the best performing model for SF prediction in the present study. ANN3 outperformed all the other tested models in predicting SF for the greatest number of data sets, namely 4 out of the 11 tested data sets. This model also exhibited among the best average RM scores, indicating that it is highly reliable in producing accurate SF predictions for different data sets, which may vary in SF behaviour and magnitude. Additionally, the algorithm and input scenario that were most effective as model components in predicting SF were ANN and input scenario 3. The ANN algorithm produced the most accurate SF predictions for 7 out of the 11 tested data sets, while the use of input scenario 3 led to the best SF predictions for 8 out of the 11 tested data sets.
In conclusion, the present study set out to address a research gap: a single ML model capable of accurately predicting SF for multiple different rivers within Peninsular Malaysia had yet to be developed and proposed, as the majority of existing studies have developed SF predictive models based on only one data set or river case study. This gap has been addressed in the present study by developing and testing 99 ML models, based on different established ML algorithms, input scenarios, and SF data sets in Peninsular Malaysia, and by proposing the best performing ML model as a universal model capable of predicting SF for rivers within the study region. Based on the findings, the present study proposes the ANN3 model as the universal model most capable of SF prediction for rivers within Peninsular Malaysia, thereby achieving the main objective of the present study. Ultimately, it is hoped that the findings of the present study will contribute to the relevant body of knowledge and help organizations mitigate the effects of environmental hazards, particularly droughts and floods, through effective and accurate SF predictions using ML models. Further improvement of the ANN3 model for SF prediction in Peninsular Malaysia can be the focus of future studies. Hybridization and the use of optimization algorithms or more advanced techniques with the ANN3 model may improve the identification of optimal hyperparameters, possibly resulting in improved accuracy. Rainfall data may also be implemented as an input parameter to improve SF prediction.

Data availability
The data that support the findings of this study are available from the Malaysian Department of Irrigation and Drainage.